Improve detection of an idle indexer #6829

Merged
merged 8 commits into from
Jul 15, 2025

labkey-jeckels commented Jul 8, 2025

Rationale

We see intermittent failures when automated tests delete containers while indexing is still running. To tell whether the indexer is actually idle, we need to check both the Item and Runnable queues. An example is Issue 53417.

Changes

  • Queue a Runnable and, when it completes, queue the Item

{
// The indexer uses multiple threads for different types of work. Queue a Runnable first, and when it executes,
// queue the Item
SearchService.IndexTask itemTask = createTask("WaitForIndexer", new SearchService.TaskListener()
Contributor

This looks fine, though it's a little more complicated than necessary. I think you can use a single task = createTask(). It fires success when all of its associated items are done, so the initial addRunnable() and the second addNoop() can both run under the same task.

Contributor Author

With a single task, is there a way to guarantee that the Runnable finishes before the Item goes into the queue, and that you get a single notification at the end of both of them?

Contributor Author

I see. As long as the Item gets added before the Runnable finishes, we'll get a single success notification when the Item completes.
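
The single-task idea discussed in this thread can be sketched with a plain pending counter. This is a minimal illustration, not LabKey's implementation: `CountingTask`, `submit()`, and the two single-thread executors are hypothetical stand-ins for the index task and the Runnable and Item queues. Because the Runnable queues the no-op Item from inside its own body, the Item is counted before the Runnable is counted as finished, so the counter reaches zero exactly once.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class SingleTaskSketch
{
    /** Hypothetical stand-in for an index task: succeeds once all queued operations are done. */
    static class CountingTask
    {
        final AtomicInteger pending = new AtomicInteger();
        final AtomicInteger successCount = new AtomicInteger();
        final CompletableFuture<Void> done = new CompletableFuture<>();

        void submit(ExecutorService queue, Runnable work)
        {
            // Count the operation before it can possibly start running
            pending.incrementAndGet();
            queue.execute(() -> {
                try
                {
                    work.run();
                }
                finally
                {
                    // The operation that brings the count to zero fires the notification
                    if (pending.decrementAndGet() == 0)
                    {
                        successCount.incrementAndGet();
                        done.complete(null);
                    }
                }
            });
        }
    }

    /** Returns how many success notifications fired for one Runnable-then-Item pass. */
    static int runOnce()
    {
        ExecutorService runnableQueue = Executors.newSingleThreadExecutor();
        ExecutorService itemQueue = Executors.newSingleThreadExecutor();
        try
        {
            CountingTask task = new CountingTask();
            // The Runnable adds the no-op Item while it is still running,
            // so pending goes 1 -> 2 before it can drop back toward 0
            task.submit(runnableQueue, () -> task.submit(itemQueue, () -> {}));
            task.done.get(5, TimeUnit.SECONDS);
            return task.successCount.get();
        }
        catch (Exception e)
        {
            throw new IllegalStateException("task did not complete", e);
        }
        finally
        {
            runnableQueue.shutdownNow();
            itemQueue.shutdownNow();
        }
    }

    public static void main(String[] args)
    {
        System.out.println("success notifications: " + runOnce());
    }
}
```

If the Item were instead added after the Runnable had already finished, the counter could hit zero early and the notification would fire before the Item ran, which is the ordering concern raised in the question above.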

@labkey-chrisj
Contributor

After running locally, I ended up with several of these in my server log:

`ERROR AbstractSearchService    2025-07-10T16:52:44,699     SearchService:runner : Error running org.labkey.experiment.api.ExpMaterialTableImpl$$Lambda$2809/0x00000147eb740af0@19f42c49
org.labkey.api.exceptions.TableNotFoundException: Table not found (deleted? race condition?): expsampleset.c14d208_generated_sample_type_1
	at org.labkey.experiment.api.property.StorageProvisionerImpl.getSchemaTableInfo(StorageProvisionerImpl.java:589) ~[experiment-25.7-SNAPSHOT.jar:?]
	at org.labkey.experiment.api.property.StorageProvisionerImpl.getSchemaTableInfo(StorageProvisionerImpl.java:1074) ~[experiment-25.7-SNAPSHOT.jar:?]
	at org.labkey.experiment.api.property.StorageProvisionerImpl.createTableInfoImpl(StorageProvisionerImpl.java:576) ~[experiment-25.7-SNAPSHOT.jar:?]
	at org.labkey.api.exp.api.StorageProvisioner.createTableInfo(StorageProvisioner.java:68) ~[api-25.7-SNAPSHOT.jar:?]
	at org.labkey.experiment.api.ExpSampleTypeImpl.getTinfo(ExpSampleTypeImpl.java:950) ~[experiment-25.7-SNAPSHOT.jar:?]
	at org.labkey.experiment.api.ExpMaterialTableImpl.getFromSQLExpanded(ExpMaterialTableImpl.java:1046) ~[experiment-25.7-SNAPSHOT.jar:?]
	at org.labkey.api.data.AbstractTableInfo.getFromSQL(AbstractTableInfo.java:407) ~[api-25.7-SNAPSHOT.jar:?]
	at org.labkey.query.sql.QuerySelectView.getSelectSQL(QuerySelectView.java:341) ~[query-25.7-SNAPSHOT.jar:?]
	at org.labkey.query.sql.QuerySelectView.getSql(QuerySelectView.java:163) ~[query-25.7-SNAPSHOT.jar:?]
	at org.labkey.query.QueryServiceImpl$SelectBuilderImpl.buildSqlFragment(QueryServiceImpl.java:2863) ~[query-25.7-SNAPSHOT.jar:?]
	at org.labkey.query.QueryServiceImpl.getSelectSQL(QueryServiceImpl.java:2739) ~[query-25.7-SNAPSHOT.jar:?]
	at org.labkey.api.data.TableSelector$TableSqlFactory.getSql(TableSelector.java:644) ~[api-25.7-SNAPSHOT.jar:?]
	at org.labkey.api.data.SqlExecutingSelector$ExecutingResultSetFactory.handleResultSet(SqlExecutingSelector.java:379) ~[api-25.7-SNAPSHOT.jar:?]
	at org.labkey.api.data.SqlExecutingSelector.getResultSet(SqlExecutingSelector.java:191) ~[api-25.7-SNAPSHOT.jar:?]
	at org.labkey.api.data.TableSelector.getResults(TableSelector.java:372) ~[api-25.7-SNAPSHOT.jar:?]
	at org.labkey.api.data.TableSelector.getResults(TableSelector.java:355) ~[api-25.7-SNAPSHOT.jar:?]
	at org.labkey.api.data.TableSelector.getResults(TableSelector.java:344) ~[api-25.7-SNAPSHOT.jar:?]
	at org.labkey.experiment.api.AbstractRunItemImpl.processIndexValues(AbstractRunItemImpl.java:376) ~[experiment-25.7-SNAPSHOT.jar:?]
	at org.labkey.experiment.api.ExpMaterialImpl.getCustomIndexValues(ExpMaterialImpl.java:474) ~[experiment-25.7-SNAPSHOT.jar:?]
	at org.labkey.experiment.api.ExpMaterialImpl.createIndexDocument(ExpMaterialImpl.java:407) ~[experiment-25.7-SNAPSHOT.jar:?]
	at org.labkey.api.exp.api.ExpSearchable.lambda$index$0(ExpSearchable.java:41) ~[api-25.7-SNAPSHOT.jar:?]
	at org.labkey.api.data.DbScope._executeWithRetry(DbScope.java:1030) ~[api-25.7-SNAPSHOT.jar:?]
	at org.labkey.api.data.DbScope.executeWithRetryReadOnly(DbScope.java:984) ~[api-25.7-SNAPSHOT.jar:?]
	at org.labkey.api.exp.api.ExpSearchable.index(ExpSearchable.java:41) ~[api-25.7-SNAPSHOT.jar:?]
	at org.labkey.experiment.api.ExpMaterialTableImpl.lambda$persistRows$5(ExpMaterialTableImpl.java:1614) ~[experiment-25.7-SNAPSHOT.jar:?]
	at org.labkey.search.model.AbstractSearchService.lambda$new$6(AbstractSearchService.java:985) ~[search-25.7-SNAPSHOT.jar:?]
	at java.base/java.lang.Thread.run(Thread.java:840) [?:?]
ERROR ExceptionUtil            2025-07-10T16:54:39,795     SearchService:runner : Exception detected and logged to mothership with error code IY4EHE
Additional exception info:
org.labkey.api.exceptions.TableNotFoundException: Table not found (deleted? race condition?): expsampleset.c15d210_generated_sample_type_1
org.labkey.api.exceptions.TableNotFoundException: Table not found (deleted? race condition?): expsampleset.c15d210_generated_sample_type_1
	at org.labkey.experiment.api.property.StorageProvisionerImpl.getSchemaTableInfo(StorageProvisionerImpl.java:589) ~[experiment-25.7-SNAPSHOT.jar:?]
	at org.labkey.experiment.api.property.StorageProvisionerImpl.getSchemaTableInfo(StorageProvisionerImpl.java:1074) ~[experiment-25.7-SNAPSHOT.jar:?]
	at org.labkey.experiment.api.property.StorageProvisionerImpl.createTableInfoImpl(StorageProvisionerImpl.java:576) ~[experiment-25.7-SNAPSHOT.jar:?]
	at org.labkey.api.exp.api.StorageProvisioner.createTableInfo(StorageProvisioner.java:68) ~[api-25.7-SNAPSHOT.jar:?]
	at org.labkey.experiment.api.ExpSampleTypeImpl.getTinfo(ExpSampleTypeImpl.java:950) ~[experiment-25.7-SNAPSHOT.jar:?]
	at org.labkey.experiment.api.ExpMaterialTableImpl.getFromSQLExpanded(ExpMaterialTableImpl.java:1046) ~[experiment-25.7-SNAPSHOT.jar:?]
	at org.labkey.api.data.AbstractTableInfo.getFromSQL(AbstractTableInfo.java:407) ~[api-25.7-SNAPSHOT.jar:?]
	at org.labkey.query.sql.QuerySelectView.getSelectSQL(QuerySelectView.java:341) ~[query-25.7-SNAPSHOT.jar:?]
	at org.labkey.query.sql.QuerySelectView.getSql(QuerySelectView.java:163) ~[query-25.7-SNAPSHOT.jar:?]
	at org.labkey.query.QueryServiceImpl$SelectBuilderImpl.buildSqlFragment(QueryServiceImpl.java:2863) ~[query-25.7-SNAPSHOT.jar:?]
	at org.labkey.query.QueryServiceImpl.getSelectSQL(QueryServiceImpl.java:2739) ~[query-25.7-SNAPSHOT.jar:?]
	at org.labkey.api.data.TableSelector$TableSqlFactory.getSql(TableSelector.java:644) ~[api-25.7-SNAPSHOT.jar:?]
	at org.labkey.api.data.SqlExecutingSelector$ExecutingResultSetFactory.handleResultSet(SqlExecutingSelector.java:379) ~[api-25.7-SNAPSHOT.jar:?]
	at org.labkey.api.data.SqlExecutingSelector.getResultSet(SqlExecutingSelector.java:191) ~[api-25.7-SNAPSHOT.jar:?]
	at org.labkey.api.data.TableSelector.getResults(TableSelector.java:372) ~[api-25.7-SNAPSHOT.jar:?]
	at org.labkey.api.data.TableSelector.getResults(TableSelector.java:355) ~[api-25.7-SNAPSHOT.jar:?]
	at org.labkey.api.data.TableSelector.getResults(TableSelector.java:344) ~[api-25.7-SNAPSHOT.jar:?]
	at org.labkey.experiment.api.AbstractRunItemImpl.processIndexValues(AbstractRunItemImpl.java:376) ~[experiment-25.7-SNAPSHOT.jar:?]
	at org.labkey.experiment.api.ExpMaterialImpl.getCustomIndexValues(ExpMaterialImpl.java:474) ~[experiment-25.7-SNAPSHOT.jar:?]
	at org.labkey.experiment.api.ExpMaterialImpl.createIndexDocument(ExpMaterialImpl.java:407) ~[experiment-25.7-SNAPSHOT.jar:?]
	at org.labkey.api.exp.api.ExpSearchable.lambda$index$0(ExpSearchable.java:41) ~[api-25.7-SNAPSHOT.jar:?]
	at org.labkey.api.data.DbScope._executeWithRetry(DbScope.java:1030) ~[api-25.7-SNAPSHOT.jar:?]
	at org.labkey.api.data.DbScope.executeWithRetryReadOnly(DbScope.java:984) ~[api-25.7-SNAPSHOT.jar:?]
	at org.labkey.api.exp.api.ExpSearchable.index(ExpSearchable.java:41) ~[api-25.7-SNAPSHOT.jar:?]
	at org.labkey.experiment.api.ExpMaterialTableImpl.lambda$persistRows$5(ExpMaterialTableImpl.java:1614) ~[experiment-25.7-SNAPSHOT.jar:?]
	at org.labkey.search.model.AbstractSearchService.lambda$new$6(AbstractSearchService.java:985) ~[search-25.7-SNAPSHOT.jar:?]
	at java.base/java.lang.Thread.run(Thread.java:840) [?:?]`

if (res == 0 && o != this)
res = (seqNum < o.seqNum ? -1 : 1);
return res;
}
Contributor

nice
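
For context, the `seqNum` tie-break in the snippet above is a common way to get first-in, first-out ordering among equal-priority entries in a priority queue, which otherwise makes no ordering promise for ties. A minimal sketch, assuming a hypothetical `Entry` class and assuming that a lower numeric priority is polled first (the PR's actual classes and priority scheme are not shown here):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

class FifoTieBreakSketch
{
    /** Hypothetical queue entry; only the compareTo() shape mirrors the snippet above. */
    static final class Entry implements Comparable<Entry>
    {
        private static final AtomicLong SEQ = new AtomicLong();

        final int priority;
        final String name;
        // Assigned at construction, so it records submission order
        final long seqNum = SEQ.getAndIncrement();

        Entry(int priority, String name)
        {
            this.priority = priority;
            this.name = name;
        }

        @Override
        public int compareTo(Entry o)
        {
            int res = Integer.compare(priority, o.priority);
            // Equal priority: the entry submitted earlier sorts first
            if (res == 0 && o != this)
                res = (seqNum < o.seqNum ? -1 : 1);
            return res;
        }
    }

    /** Queues four entries and returns their names in the order they are polled. */
    static List<String> drainOrder()
    {
        PriorityBlockingQueue<Entry> queue = new PriorityBlockingQueue<>();
        queue.add(new Entry(5, "first"));
        queue.add(new Entry(1, "urgent"));
        queue.add(new Entry(5, "second"));
        queue.add(new Entry(5, "third"));

        List<String> order = new ArrayList<>();
        Entry e;
        while ((e = queue.poll()) != null)
            order.add(e.name);
        return order;
    }

    public static void main(String[] args)
    {
        System.out.println(drainOrder()); // prints [urgent, first, second, third]
    }
}
```

Without the tie-break, a marker entry queued last could be polled ahead of earlier work at the same priority, so a "queue is drained" signal could fire too soon.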

labkey-matthewb self-requested a review July 14, 2025 17:05
// queue an Item on every task
task.addRunnable(priority, () -> {
_tasks.forEach(t -> t.addNoop(priority, latch));
_defaultTask.addNoop(priority, latch);
Contributor

This seems fine, but does TaskListener not work?

Contributor Author

Last week, after I changed compareTo() to take submission order into account, I tried the TaskListener approach without adding an Item to every task. It didn't work.

Either I made a separate fix afterwards or wasn't testing what I thought I was, as it's working today. I switched back to the TaskListener and standard JDK latch implementation. I also added some debug-level logging to make this easier next time around.
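
As a rough illustration of the "standard JDK latch" approach, the sketch below waits on a `java.util.concurrent.CountDownLatch` that is released only after a marker has passed through both queues in turn. It is a sketch under stated assumptions, not the merged code: `waitForIdle()` and the two single-thread executors are hypothetical stand-ins for the indexer's Runnable and Item queues, and LabKey's `TaskListener` is not modeled.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class WaitForIdleSketch
{
    /** Returns true if a marker drained through both queues before the timeout. */
    static boolean waitForIdle(ExecutorService runnableQueue, ExecutorService itemQueue,
                               long timeout, TimeUnit unit)
    {
        CountDownLatch latch = new CountDownLatch(1);
        // Queue a Runnable first; only when it executes is the no-op Item queued
        runnableQueue.execute(() -> itemQueue.execute(latch::countDown));
        try
        {
            return latch.await(timeout, unit);
        }
        catch (InterruptedException e)
        {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    private static void pause(long millis)
    {
        try
        {
            Thread.sleep(millis);
        }
        catch (InterruptedException e)
        {
            Thread.currentThread().interrupt();
        }
    }

    /** Returns the number of earlier jobs finished when the wait returned, or -1 on timeout. */
    static int demo()
    {
        ExecutorService runnableQueue = Executors.newSingleThreadExecutor();
        ExecutorService itemQueue = Executors.newSingleThreadExecutor();
        try
        {
            AtomicInteger finished = new AtomicInteger();
            // Simulated in-flight indexing work on each queue
            runnableQueue.execute(() -> { pause(100); finished.incrementAndGet(); });
            itemQueue.execute(() -> { pause(100); finished.incrementAndGet(); });

            boolean idle = waitForIdle(runnableQueue, itemQueue, 5, TimeUnit.SECONDS);
            return idle ? finished.get() : -1;
        }
        finally
        {
            runnableQueue.shutdownNow();
            itemQueue.shutdownNow();
        }
    }

    public static void main(String[] args)
    {
        System.out.println("jobs finished before idle was reported: " + demo());
    }
}
```

The guarantee here relies on each queue processing work in submission order on a single thread. With several worker threads per queue, a marker could run while earlier work is still in flight, so a real implementation would need additional accounting.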

labkey-jeckels merged commit 2be10a6 into release25.7-SNAPSHOT Jul 15, 2025
11 checks passed
labkey-jeckels deleted the 25.7_fb_drainIndexingRunnables branch July 15, 2025 18:42